The Constrained Laplacian Rank Algorithm for Graph-Based Clustering
Abstract
Graph-based clustering methods perform clustering on a fixed input data graph. If this initial construction is of low quality, then the resulting clustering may also be of low quality. Moreover, existing graph-based clustering methods require post-processing on the data graph to extract the clustering indicators. We address both of these drawbacks by allowing the data graph itself to be adjusted as part of the clustering procedure. In particular, our Constrained Laplacian Rank (CLR) method learns a graph with exactly k connected components (where k is the number of clusters). We develop two versions of this method, based upon the L1-norm and the L2-norm, which yield two new graph-based clustering objectives. We derive optimization algorithms to solve these objectives. Experimental results on synthetic datasets and real-world benchmark datasets demonstrate the effectiveness of this new graph-based clustering method.

∗To whom all correspondence should be addressed. This work was partially supported by US NSF-IIS 1117965, NSF-IIS 1302675, NSF-IIS 1344152, NSF-DBI 1356628, and NIH R01 AG049371. Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

State-of-the-art clustering methods are often based on graphical representations of the relationships among data points. For example, spectral clustering (Ng, Jordan, and Weiss 2001), normalized cut (Shi and Malik 2000), and ratio cut (Hagen and Kahng 1992) all transform the data into a weighted, undirected graph based on pairwise similarities. Clustering is then accomplished by spectral or graph-theoretic optimization procedures. See (Ding and He 2005; Li and Ding 2006) for a discussion of the relations among these graph-based methods, and also of the connections to nonnegative matrix factorization. All of these methods involve a two-stage process in which a data graph is formed from the data, and then various optimization procedures are invoked on this fixed input data graph. A disadvantage of this two-stage process is that the final clustering structures are not represented explicitly in the data graph (e.g., graph-cut methods often use the K-means algorithm to post-process the results to obtain the clustering indicators); also, the clustering results depend on the quality of the input data graph (i.e., they are sensitive to the particular graph-construction method). It seems plausible that a strategy in which the optimization phase is allowed to change the data graph could have advantages relative to the two-phase strategy.

In this paper we propose a novel graph-based clustering model that learns a graph with exactly k connected components (where k is the number of clusters). In our new model, instead of fixing the input data graph associated with the affinity matrix, we learn a new data similarity matrix that is block diagonal and has exactly k connected components: the k clusters. Thus, our new data similarity matrix is directly useful for the clustering task; the clustering results can be obtained immediately, without any post-processing to extract the clustering indicators. To achieve such ideal clustering structures, we impose a rank constraint on the Laplacian of the new data similarity matrix, thereby guaranteeing the existence of exactly k connected components. Considering both L2-norm and L1-norm objectives, we propose two new clustering objectives and derive optimization algorithms to solve them. We also introduce a novel graph-construction method to initialize the graph associated with the affinity matrix.

We conduct empirical studies on simulated datasets and seven real-world benchmark datasets to validate the proposed methods. The experimental results are promising: we find that our new graph-based clustering method consistently outperforms other related methods in most cases.

Notation: Throughout the paper, all matrices are written in uppercase.
For a matrix M, the i-th row and the ij-th element of M are denoted by m_i and m_ij, respectively. The trace of M is denoted by Tr(M). The L2-norm of a vector v is denoted by ‖v‖_2; the Frobenius norm and the L1-norm of M are denoted by ‖M‖_F and ‖M‖_1, respectively.

New Clustering Formulations

Graph-based clustering approaches typically optimize their objectives based on a given data graph associated with an affinity matrix A ∈ R^{n×n} (which can be symmetric or nonsymmetric), where n is the number of nodes (data points) in the graph. There are two drawbacks to these approaches: (1) the clustering performance is sensitive to the quality of the data graph construction; (2) the cluster structures are not explicit in the clustering results, and a post-processing step is needed to uncover the clustering indicators. To address these two challenges, we aim to learn a new data graph S based on the given data graph A such that the new data graph is more suitable for the clustering task. In our strategy, we propose to learn a new data graph S that has exactly k connected components, where k is the number of clusters.

In order to formulate a clustering objective based on this strategy, we start from the following theorem. If the affinity matrix A is nonnegative, then the Laplacian matrix L_A = D_A − (A^T + A)/2, where the degree matrix D_A ∈ R^{n×n} is defined as a diagonal matrix whose i-th diagonal element is Σ_j (a_ij + a_ji)/2, has the following important property (Mohar 1991; Chung 1997):

Theorem 1. The multiplicity k of the eigenvalue zero of the Laplacian matrix L_A is equal to the number of connected components in the graph associated with A.
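Theorem 1 can be checked numerically. The following is a minimal sketch (our own illustration in NumPy, not code from the paper): we build an affinity matrix with two disconnected cliques, form the Laplacian L_A = D_A − (A^T + A)/2 as defined above, and count its near-zero eigenvalues.

```python
import numpy as np

# Affinity matrix with two connected components: nodes {0,1,2} and {3,4,5}.
A = np.zeros((6, 6))
A[0:3, 0:3] = 1.0   # component 1: fully connected
A[3:6, 3:6] = 1.0   # component 2: fully connected
np.fill_diagonal(A, 0.0)

W = (A.T + A) / 2.0               # symmetrized affinity (A^T + A)/2
D = np.diag(W.sum(axis=1))        # degree matrix D_A
L = D - W                         # Laplacian L_A = D_A - (A^T + A)/2

eigvals = np.linalg.eigvalsh(L)   # ascending; L is symmetric PSD
num_components = int(np.sum(eigvals < 1e-9))
print(num_components)             # multiplicity of eigenvalue zero -> 2
```

As Theorem 1 predicts, the zero eigenvalue has multiplicity 2, matching the two connected components, and rank(L_A) = n − k = 4.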
Given a graph with affinity matrix A, Theorem 1 indicates that if rank(L_A) = n − k, then the graph is an ideal graph on which we can already partition the data points into k clusters, without performing K-means or other discretization procedures as is necessary with traditional graph-based clustering methods such as spectral clustering.

Motivated by Theorem 1, given an initial affinity matrix A ∈ R^{n×n}, we learn a similarity matrix S ∈ R^{n×n} whose corresponding Laplacian matrix L_S = D_S − (S^T + S)/2 is constrained to satisfy rank(L_S) = n − k. Under this constraint, the learned S is block diagonal up to permutation, and thus we can directly partition the data points into k clusters based on S (Nie, Wang, and Huang 2014). To avoid the case in which some rows of S are all zeros, we further constrain S so that the sum of each row of S is one. Under these constraints, we learn the S that best approximates the initial affinity matrix A. Considering two different distances between the given affinity matrix A and the learned similarity matrix S, the L2-norm and the L1-norm, we define Constrained Laplacian Rank (CLR) for graph-based clustering as the solution to one of the following optimization problems:

J_CLR-L2 = min_{Σ_j s_ij = 1, s_ij ≥ 0, rank(L_S) = n−k} ‖S − A‖_F^2,   (1)

J_CLR-L1 = min_{Σ_j s_ij = 1, s_ij ≥ 0, rank(L_S) = n−k} ‖S − A‖_1.   (2)

These problems seem very difficult to solve, since L_S = D_S − (S^T + S)/2, where D_S also depends on S, and the constraint rank(L_S) = n − k is a complex nonlinear constraint. In the next section, we propose novel and efficient algorithms to solve these problems.

Optimization Algorithms

Optimization Algorithm for Solving J_CLR-L2 in Eq. (1)

Let σ_i(L_S) denote the i-th smallest eigenvalue of L_S. Note that σ_i(L_S) ≥ 0 because L_S is positive semidefinite. Problem (1) is equivalent to the following problem for a large enough value of λ:

min_{Σ_j s_ij = 1, s_ij ≥ 0} ‖S − A‖_F^2 + 2λ Σ_{i=1}^{k} σ_i(L_S).   (3)
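A relaxed objective of this form is commonly handled by alternating minimization: with S fixed, by the Ky Fan theorem Σ_{i=1}^k σ_i(L_S) = min_{F^T F = I} Tr(F^T L_S F), so the optimal F ∈ R^{n×k} stacks the eigenvectors of L_S for its k smallest eigenvalues; with F fixed, each row s_i of S decouples into a Euclidean projection of a_i − (λ/2) v_i onto the probability simplex, where v_ij = ‖f_i − f_j‖_2^2. The sketch below is our own illustration of the structure of one such alternating step, not the paper's exact implementation; the helper names project_simplex and clr_l2_step are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {s : s >= 0, sum(s) = 1}
    (the standard sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    # largest index rho with u[rho] + (1 - css[rho]) / (rho + 1) > 0
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def clr_l2_step(A, S, lam, k):
    """One alternating step for the relaxed L2 objective (sketch)."""
    n = A.shape[0]
    W = (S.T + S) / 2.0
    L = np.diag(W.sum(axis=1)) - W          # L_S
    # F-step: eigenvectors of L_S for the k smallest eigenvalues (Ky Fan)
    _, vecs = np.linalg.eigh(L)
    F = vecs[:, :k]
    # v_ij = ||f_i - f_j||^2: pairwise squared distances of rows of F
    V = np.square(F[:, None, :] - F[None, :, :]).sum(axis=2)
    # S-step: row-wise simplex projection of a_i - (lam/2) v_i
    return np.vstack([project_simplex(A[i] - 0.5 * lam * V[i])
                      for i in range(n)])
```

Iterating the two steps (and increasing λ when L_S has fewer than k near-zero eigenvalues) drives rank(L_S) toward n − k while keeping every row of S on the simplex.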